Search for: All records
Total Resources: 3
Author / Contributor (filter by Author / Creator):
- Aybat, Necdet Serhat (1)
- Boloni, Ladislau (1)
- Cutkosky, Ashok (1)
- Fallah, Alireza (1)
- Gurbuzbalaban, Mert (1)
- Khodadadeh, Siavash (1)
- Orabona, Francesco (1)
- Ozdaglar, Asuman (1)
- Shah, Mubarak (1)
Filter by Editor:
- Beygelzimer, A. (2)
- Fox, E. (2)
- Garnett, R. (2)
- Larochelle, H. (2)
- Wallach, H. (2)
- Beygelzimer, A (1)
- Fox, E (1)
- Garnett, R (1)
- Larochelle, H (1)
- Wallach, H (1)
- d'Alché-Buc, F (1)
- d'Alché-Buc, F. (1)
Results:
- Wallach, H.; Larochelle, H.; Beygelzimer, A.; Fox, E.; Garnett, R. (Ed.)
- Cutkosky, Ashok; Orabona, Francesco (Advances in Neural Information Processing Systems). Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E.; Garnett, R. (Ed.)
  Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance-reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $$F$$, STORM finds a point $$\boldsymbol{x}$$ with $$E[\|\nabla F(\boldsymbol{x})\|] \le O(1/\sqrt{T} + \sigma^{1/3}/T^{1/3})$$ in $$T$$ iterations with $$\sigma^2$$ variance in the gradients, matching the optimal rate and without requiring knowledge of $$\sigma$$.
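  The abstract above describes STORM's core mechanism: a momentum-style recursive gradient estimator that achieves variance reduction without mega-batches, paired with an adaptive learning rate. Below is a minimal sketch of that recursion in Python, assuming a stochastic gradient oracle `grad(x, xi)` and a toy quadratic objective; the hyperparameters `k`, `w`, and `c` and the cube-root step-size schedule follow the paper's general form but are illustrative placeholders, not the authors' reference implementation.

  ```python
  # Minimal sketch of the STORM recursion, under the assumptions stated above.
  import numpy as np

  rng = np.random.default_rng(0)

  def grad(x, xi):
      # Toy stochastic gradient of f(x) = 0.5 * ||x||^2 with additive noise xi.
      return x + xi

  def storm(x0, T, k=0.1, w=0.1, c=10.0):
      x = x0.copy()
      xi = rng.normal(scale=0.1, size=x.shape)
      d = grad(x, xi)                       # d_1: plain stochastic gradient
      G2_sum = np.dot(d, d)                 # running sum of squared gradient norms
      for t in range(1, T):
          eta = k / (w + G2_sum) ** (1 / 3)  # adaptive learning rate
          x_prev = x
          x = x - eta * d                    # descend along the estimator
          a = min(1.0, c * eta ** 2)         # momentum parameter a_{t+1}
          xi = rng.normal(scale=0.1, size=x.shape)
          # Query the same sample at both the new and the previous iterate.
          g_new, g_old = grad(x, xi), grad(x_prev, xi)
          # STORM estimator: momentum plus a variance-reduction correction.
          d = g_new + (1.0 - a) * (d - g_old)
          G2_sum += np.dot(g_new, g_new)
      return x

  x_final = storm(np.ones(5), T=2000)
  print(np.linalg.norm(x_final))  # should be small for this toy quadratic
  ```

  The correction term `(d - g_old)` reuses one sample at two consecutive iterates, which is what lets the estimator's variance shrink over time without the large batches the abstract says STORM avoids.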
- Khodadadeh, Siavash; Boloni, Ladislau; Shah, Mubarak (Advances in Neural Information Processing Systems). Wallach, H; Larochelle, H; Beygelzimer, A; d'Alché-Buc, F; Fox, E; Garnett, R (Ed.)